System and method to efficiently approximate the term 2^(X)

ABSTRACT

The present invention relates to a system and method to efficiently approximate the term 2^(X). The system includes an approximation apparatus to approximate 2^(X), wherein X is a real number. The system further includes a memory to store a computer program that utilizes the approximation apparatus. The system also includes a central processing unit (CPU) that is cooperatively connected to the approximation apparatus and the memory, and that executes the computer program.

BACKGROUND

[0001] (1) Field

[0002] The present invention relates to a system and method to efficiently approximate the term 2^(X).

[0003] (2) General Background

[0004] The function f(X)=2^(X), where X is a real number, is a common operation in computational applications. The term 2^(X) can be expressed using the Taylor series as follows: $\begin{matrix}{2^{X} = {\sum\limits_{N = 0}^{\infty}\frac{\left( {X\ln 2} \right)^{N}}{N!}}} & (1)\end{matrix}$

[0005] Accordingly, 2^(X) can be approximated by adding the first few elements of the sum expressed in equation (1). Since X is a real number, the process of approximating the term 2^(X) would involve floating-point operations. Because operations on floating-point data are costly in nature, an efficient technique of approximating 2^(X) would prove useful.
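By way of illustration only, the partial sum of equation (1) might be sketched in Python as follows. The function name `exp2_taylor` and its `terms` parameter are hypothetical names introduced for this sketch and do not appear in the embodiments described herein.

```python
import math


def exp2_taylor(x: float, terms: int) -> float:
    """Sum the first `terms` elements of equation (1): (x ln 2)^N / N!."""
    t = x * math.log(2.0)
    total = 0.0
    term = 1.0  # element N = 0: (x ln 2)^0 / 0!
    for n in range(terms):
        total += term
        term *= t / (n + 1)  # next element: multiply by (x ln 2) / (N + 1)
    return total
```

With only a few elements the sum is close to 2^(X) for small X but falls well short for large X, which is one reason a direct evaluation of equation (1) over the whole input range is unattractive.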

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] FIG. 1 is a block diagram of a computing system in accordance with one embodiment of the present invention;

[0007] FIG. 2 shows a single precision floating-point data structure, as defined by the IEEE Standard for Binary Floating-Point Arithmetic, IEEE std. 754-1985, published Aug. 12, 1985;

[0008] FIG. 3A is a block diagram of an apparatus to approximate the term 2^(X) in accordance with one embodiment of the present invention;

[0009] FIG. 3B is a block diagram of an apparatus to approximate the term C^(Z), where C is a constant and a positive number and Z is a real number;

[0010] FIG. 4A shows the structure of the shifted └X┘_(integer) value that is the output of the shift-left logical operator of FIG. 3A in accordance with one embodiment of the present invention;

[0011] FIG. 4B illustrates the structure of the approximation of 2^(ΔX) represented in a single precision floating-point data format as defined by IEEE Standard 754;

[0012] FIG. 5 is a block diagram of an apparatus for rounding real numbers using the “floor” or “rounding toward minus infinity (−∞)” technique in accordance with one embodiment of the present invention;

[0013] FIG. 6 is a block diagram of an apparatus for approximating the term 2^(ΔX) using the first five (5) elements of the Taylor series in accordance with one embodiment of the present invention; and

[0014] FIG. 7 is a flow diagram that generally outlines the process of approximating 2^(X) in accordance with one embodiment of the present invention.

DETAILED DESCRIPTION

[0015] The present invention relates to a system and method to efficiently approximate the term 2^(X).

[0016] FIG. 1 is an exemplary block diagram of a computing system 100 in accordance with one embodiment of the present invention. Computing system 100 includes a central processing unit (CPU) 105 and memory 115 that is cooperatively connected to the CPU 105 through data bus 125. CPU 105 is used to execute a computer program 110, which is stored in memory 115 and utilizes 2^(X)-approximation apparatus 120.

[0017] CPU 105 is cooperatively connected to approximation apparatus 120 through data bus 130. Approximation apparatus 120 generally accepts as input a real number (X) which is represented in floating-point format, approximates the term 2^(X), and outputs the approximation of 2^(X). The approximation of 2^(X) is a real number represented in floating-point format.

[0018] FIG. 2 shows a single precision floating-point data structure 200, as defined by the IEEE Standard for Binary Floating-Point Arithmetic, IEEE std. 754-1985, published Aug. 12, 1985. A floating-point format is generally a data structure specifying the fields that comprise a floating-point numeral, the layout of those fields, and the arithmetic interpretation of those fields.

[0019] The single precision floating-point data structure 200 includes three fields: a mantissa (M) 205, an exponent (E) 210, and a sign (S) 215. In one embodiment, these three fields are stored contiguously in one 32-bit word, with bit 0 being the least significant bit and bit 31 being the most significant bit.

[0020] Bits 0 to 22 contain the 23-bit mantissa 205. Mantissa 205 generally contains the fractional portion of a real number, and is sometimes called the fraction. It should be noted that real numbers are usually normalized so that the most significant digit of the mantissa 205 is non-zero, allowing the mantissa 205 to contain the maximum possible number of significant digits.

[0021] It should also be noted that a mantissa, as defined by IEEE Standard 754, assumes a leading one (1). For example, a real number of (1.YYYYYY×2^(M)), where 1.YYYYYY is a real number represented in base two (2), Y represents a digit in the fractional portion of the real number 1.YYYYYY, and M is a positive integer, is represented by a floating-point value in which YYYYYY is stored in the mantissa of the floating-point value, and M is stored in the exponent of the floating-point value. Bits 23 to 30 contain the 8-bit exponent 210. Exponent 210 is generally a binary integer representing the power of two by which the mantissa or fraction is multiplied. Exponents are typically represented in a biased form, in which a constant is added to the actual exponent so that the biased exponent is always a positive number. The value of the biasing constant depends on the number of bits available for representing exponents in the floating-point format being used. For example, the bias constant is 127 for the single precision floating-point format as defined by IEEE Standard 754.

[0022] Bit 31 is the sign bit 215. Sign bit 215 indicates whether the represented real number is positive or negative.
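By way of illustration only, the three fields of data structure 200 might be extracted in software as sketched below. The function name `decode_float32` is a hypothetical name introduced for this sketch; the sketch relies on Python's standard `struct` module to obtain the 32-bit pattern of a single precision value.

```python
import struct


def decode_float32(value: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, mantissa) of `value` as a 32-bit float."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    sign = bits >> 31                # bit 31
    exponent = (bits >> 23) & 0xFF   # bits 23 to 30, biased by 127
    mantissa = bits & 0x7FFFFF       # bits 0 to 22, leading one not stored
    return sign, exponent, mantissa
```

For example, 1.0 decodes to a sign of 0, a biased exponent of 127, and a mantissa of 0, consistent with the bias constant of 127 noted above.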

[0023] It should be noted that there are other floating-point formats, e.g., double precision, double extended precision, or the like. Descriptions of embodiments in accordance with the present invention will discuss only IEEE single precision floating-point format for illustrative purposes and to avoid obscuring the present invention. However, the embodiments described below can be modified in accordance with inventive principles of the present invention to support floating-point formats other than the IEEE Standard 754 single precision floating-point format.

[0024] FIG. 3A is an exemplary block diagram of an apparatus 300 to approximate the term 2^(X) in accordance with one embodiment of the present invention. Approximation apparatus 300 generally receives as input a real number (X) 314 which is represented in floating-point format, approximates the term 2^(X), and outputs an approximation 316 of 2^(X). The approximation 316 of 2^(X) is represented in floating-point format.

[0025] The real number (X) 314 can be expressed using the following equation (2):

X=└X┘+ΔX,  (2)

[0026] where └X┘ is the value of X rounded toward minus infinity (−∞) and ΔX is the fractional portion of X (that is, ΔX=X−└X┘).

[0027] As a result, 2^(X) can be expressed using the following equation (3):

2^(X)=2^(└X┘+ΔX)=2^(└X┘)×2^(ΔX)  (3)
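By way of a worked illustration of equations (2) and (3), with the input value X=−1.25 chosen arbitrarily for this example: $\begin{matrix}{\lfloor X \rfloor = -2,\quad \Delta X = X - \lfloor X \rfloor = 0.75,\quad 2^{-1.25} = 2^{-2} \times 2^{0.75} \approx 0.25 \times 1.6818 \approx 0.4204}\end{matrix}$

It can be seen that ΔX remains non-negative even when X is negative, because └X┘ is rounded toward minus infinity rather than toward zero.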

[0028] Approximation apparatus 300 generally computes an approximation 316 of 2^(X) using the principles set forth in the aforementioned equations (2) and (3). Approximation apparatus 300 includes rounding apparatus 302, integer-to-floating-point (INT-to-FP) converter 304, floating-point subtraction operator 306, 2^(ΔX)-approximation apparatus 308, shift-left logical operator 310, and integer addition operator 312.

[0029] Rounding apparatus 302 is a device for rounding a real number using the technique of “floor” or “rounding toward minus infinity (−∞)”. In the “floor” or “rounding toward minus infinity (−∞)” technique, a real number is generally rounded down to the nearest integer that is less than or equal to the real number. For example, +1.5 would be rounded down to 1, and −1.90 would be rounded down to −2. A detailed description of the rounding apparatus will be provided below in FIG. 5 and the text accompanying the figure.

[0030] Rounding apparatus 302 generally accepts as input a real number (X) 314, rounds the input value 314 down to the next integer in accordance with the technique of “floor” or “rounding toward minus infinity (−∞)”, and outputs the rounded value, └X┘_(integer) 318. └X┘_(integer) 318 is represented in a standard integer format.

[0031] INT-to-FP converter 304 generally converts an integer represented in a standard integer format to an integer represented in floating-point format. INT-to-FP converter 304 receives └X┘_(integer) 318 as input, and returns └X┘_(floating-point) 320. └X┘_(floating-point) 320 essentially is the value of X rounded down to the next integer in accordance with the “floor” or “rounding toward minus infinity (−∞)” technique. Unlike └X┘_(integer) 318, └X┘_(floating-point) 320 is represented in floating-point format.

[0032] Floating-point subtraction operator 306 has two operands 322, 324, subtracts the second operand (OP2) 324 from the first operand (OP1) 322, and returns the result of the subtraction. X 314 is fed into floating-point subtraction operator 306 as the first operand (OP1) 322 of the operator 306. └X┘_(floating-point) 320 is fed into floating-point subtraction operator 306 as the second operand (OP2) 324 of the operator. Floating-point subtraction operator 306 subtracts └X┘_(floating-point) 320 from X 314, and returns the result 328, denoted by ΔX. ΔX 328 is represented in floating-point format, and is fed into 2^(ΔX)-approximation apparatus 308.

[0033] Approximation apparatus 308 receives ΔX 328 as input, computes an approximation 330 of 2^(ΔX), and outputs the approximation 330. A detailed description of approximation apparatus 308 will be provided below in FIG. 6 and the text accompanying the figure.

[0034] Shift-left logical operator 310 has two operands 332, 334, performs a bit-wise left shift of the first operand (OP1) 332 by a predetermined number of bit positions specified by the second operand (OP2) 334, and outputs the shifted └X┘_(integer) value 336. └X┘_(integer) 318 is fed into shift-left logical operator 310 as OP1 332 of the operator 310. A shift-left bit count 338 is fed into shift-left logical operator 310 as the second operand (OP2) 334 of the operator 310. Shift-left logical operator 310 shifts └X┘_(integer) 318 to the left by the shift-left bit count 338. Shift-left logical operator 310 is used to shift the └X┘_(integer) value to the left by a predetermined number of bit positions so that the value occupies bit positions reserved for the exponent of a floating-point value.

[0035] In one embodiment supporting single precision floating-point data format as defined by IEEE Standard 754, shift-left bit count 338 has a value of twenty-three (23), and is fed into shift-left logical operator 310 as OP2 334 of the operator 310. Accordingly, shift-left logical operator 310 shifts └X┘_(integer) 318 to the left by twenty-three bits, and outputs the shifted └X┘_(integer) value 336.

[0036] It should be noted that the shift count value may not be twenty-three (23) in alternative embodiments that support other floating-point formats, e.g., double precision, double extended precision, or the like.

[0037] FIG. 4A shows the structure 400 of the shifted └X┘_(integer) value that is outputted by shift-left logical operator 310 of FIG. 3A. The shifted └X┘_(integer) value has a 32-bit structure 400, with bit 31 being the most significant bit and bit 0 being the least significant bit. Bits 23 to 31 of the shifted └X┘_(integer) value will essentially contain └X┘_(integer) 402. Bits 0 to 22 of the shifted └X┘_(integer) value will collectively have a value of zero (0) 404.

[0038] Returning to FIG. 3A, integer addition operator 312 has two operands (OP1 and OP2) 340, 342. Integer addition operator 312 treats the two operands 340, 342 as integers represented in a standard integer format, performs an integer addition operation to add OP2 342 to OP1 340, and outputs the sum 316. The sum 316 that integer addition operator 312 outputs is essentially an approximation 316 of 2^(X).

[0039] The shifted └X┘_(integer) value 336, which is the output of shift-left logical operator 310, is fed into integer addition operator 312 as the first operand (OP1) 340 of the operator 312. The structure of the shifted └X┘_(integer) value is shown above in FIG. 4A and the description of the figure.

[0040] The approximation 330 of 2^(ΔX) is represented in floating-point format, and is fed into integer addition operator 312 as the second operand (OP2) 342 of the operator 312.

[0041] FIG. 4B illustrates the structure 410 of the approximation of 2^(ΔX) represented in a single precision floating-point data format as defined by IEEE Standard 754. Bits 23 to 31 contain the exponent 412 of the approximation of 2^(ΔX), together with the sign bit in bit 31 (see FIG. 2), and bits 0 to 22 contain the mantissa 414 of the approximation of 2^(ΔX).

[0042] Returning to FIG. 3A, integer addition operator 312 treats the shifted └X┘_(integer) value 336 and the approximation 330 of 2^(ΔX) as integers, and performs an integer addition operation to compute their sum, even though the approximation 330 of 2^(ΔX) is represented in floating-point format. As a result, bits 23 to 31 of the shifted └X┘_(integer) value 336 are added to bits 23 to 31 of the approximation of 2^(ΔX), and bits 0 to 22 of the shifted └X┘_(integer) value 336 are added to bits 0 to 22 of the approximation 330 of 2^(ΔX).

[0043] As stated above, bits 23 to 31 of the shifted └X┘_(integer) value 336 contain └X┘_(integer), and bits 0 to 22 of the shifted └X┘_(integer) value 336 contain a value of zero (0). Therefore, └X┘_(integer) is added to the exponent of the approximation 330 of 2^(ΔX), and zero (0) is added to the mantissa of the approximation 330 of 2^(ΔX). By adding └X┘_(integer) to the exponent of the approximation 330 of 2^(ΔX), integer addition operator 312 is effectively multiplying the approximation 330 of 2^(ΔX) by 2^(└X┘) in accordance with equation (3), which is described above. Accordingly, the output of integer addition operator 312 is an approximation 316 of 2^(X).

[0044] In summary, approximation apparatus 300 computes an approximation of 2^(X) using the principles set forth in equations (2) and (3).
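By way of illustration only, the cooperation of shift-left logical operator 310 and integer addition operator 312 might be sketched in software as follows. The function names are hypothetical names introduced for this sketch. The sketch assumes 32-bit wraparound arithmetic, which is emulated here by masking with 0xFFFFFFFF, and assumes that the resulting exponent stays within the normal range of the single precision format.

```python
import struct


def float32_to_bits(value: float) -> int:
    """Reinterpret a single precision value as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float32(bits: int) -> float:
    """Reinterpret an unsigned 32-bit integer as a single precision value."""
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


def scale_by_power_of_two(approx_2_dx: float, floor_x: int) -> float:
    """Add floor_x to the exponent field of approx_2_dx using integer addition."""
    shifted = (floor_x << 23) & 0xFFFFFFFF  # bits 0 to 22 are zero
    total = (shifted + float32_to_bits(approx_2_dx)) & 0xFFFFFFFF
    return bits_to_float32(total)
```

For example, adding a └X┘_(integer) of 3 to the exponent of 1.5 yields 12.0, which is 1.5×2^(3), without any floating-point multiplication being performed.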

[0045] FIG. 3B is an exemplary block diagram of an apparatus 345 to approximate the term C^(Z), where C is a constant and a positive number and Z is a real number, in accordance with one embodiment of the present invention. The term C^(Z) can be generally expressed using the following equation (4):

C^(Z)=2^((log₂ C)×Z)  (4)

[0046] Approximation apparatus 345 generally receives as input a real number (Z) 356 which is represented in floating-point format, approximates the term C^(Z) in accordance with equation (4), and outputs the approximation 362 of C^(Z). The approximation 362 of C^(Z) is represented in floating-point format.

[0047] Approximation apparatus 345 includes floating-point multiplication operator 350 and 2^(X)-approximation apparatus 300. Floating-point multiplication operator 350 has two operands (OP1 and OP2) 352, 354. Operator 350 performs a floating-point multiplication to compute the product of OP1×OP2, and outputs the product.

[0048] The real number Z 356 is fed into floating-point multiplication operator 350 as the first operand (OP1) 352 of the operator 350. A value 358 of log₂ C is fed into floating-point multiplication operator 350 as the second operand (OP2) 354 of the operator 350. The value 358 of log₂ C is represented in floating-point format. Floating-point multiplication operator 350 multiplies the real number Z 356 by the value 358 of log₂ C and outputs the product 360 of (log₂ C)×Z.

[0049] Apparatus 300 is shown in FIG. 3A and described in the text accompanying the figure. As shown in FIG. 3B, the product 360 of (log₂ C)×Z is fed into apparatus 300 as input. Accordingly, apparatus 300 returns an approximation of C^(Z).
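By way of illustration only, the arrangement of FIG. 3B might be sketched in software as follows. The function name `pow_via_exp2` and its `exp2` parameter are hypothetical names introduced for this sketch. For simplicity, the sketch substitutes Python's built-in exponentiation for 2^(X)-approximation apparatus 300; any other routine that approximates 2^(X) could be passed in its place.

```python
import math


def pow_via_exp2(c: float, z: float, exp2=lambda x: 2.0 ** x) -> float:
    """Approximate c**z as 2**((log2 c) * z), per equation (4); c must be positive."""
    log2_c = math.log2(c)      # constant for a given C; could be precomputed
    product = log2_c * z       # floating-point multiplication
    return exp2(product)       # stand-in for the 2^(X) approximation
```

Because C is a constant, the value of log₂ C need only be computed once, so that each evaluation of C^(Z) costs one additional floating-point multiplication beyond the approximation of 2^(X).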

[0050] Turning now to FIG. 5, this figure is a block diagram of an apparatus 500 for rounding real numbers using the “floor” or “rounding toward minus infinity (−∞)” technique in accordance with one embodiment of the present invention. Rounding apparatus 500 generally accepts a real number (X) 514 as input, rounds the input value 514 down to the next integer, and outputs the rounded value, └X┘_(integer). └X┘_(integer) is represented in a standard integer format.

[0051] Rounding apparatus 500 includes a floating-point to integer (FP-to-INT) converter 502, an integer to floating-point (INT-to-FP) converter 504, floating-point subtraction operator 506, “less than” or “<” comparator 508, “AND” operator 510, and integer subtraction operator 512.

[0052] FP-to-INT converter 502 generally converts a real number to an integer represented in a standard integer format. FP-to-INT converter 502 performs the conversion by truncating the fractional portion of the real number. Rounding apparatus 500 uses FP-to-INT converter 502 to compute the integral portion 516 of input value 514. Input value 514 is a real number represented in floating-point format, and is fed into FP-to-INT converter 502. FP-to-INT converter 502 returns the integral portion 516 of input value 514. The integral portion 516 is represented in a standard integer format.

[0053] INT-to-FP converter 504 generally converts an integer represented in a standard integer format to an integer represented in floating-point format. The output 516 of FP-to-INT converter 502 is fed into INT-to-FP converter 504. INT-to-FP converter 504 returns the integral portion 518 of the input value. The integral portion 518 is represented in floating-point format.

[0054] Floating-point subtraction operator 506 has two operands 520, 522 and subtracts the second operand (OP2) 522 from the first operand (OP1) 520. Rounding apparatus 500 uses floating-point subtraction operator 506 to compute the fractional portion of input value 514. Input value 514 is represented in floating-point format and is fed into floating-point subtraction operator 506 as the first operand (OP1) 520 of the operator 506. It should be noted that the value that is fed into OP1 520 is the same value that is fed into FP-to-INT converter 502.

[0055] The output 518 of INT-to-FP converter 504 is fed into floating-point subtraction operator 506 as the second operand (OP2) 522 of the operator 506. As stated above, the output 518 of INT-to-FP converter 504 is essentially the integral portion of the input value, and is represented in floating-point format. Floating-point subtraction operator 506 subtracts OP2 522 from OP1 520 to compute the fractional portion 524 of input value 514. This fractional portion 524 is represented in floating-point format.

[0056] “<” or “less-than” comparator 508 has two operands 526, 528, performs a comparison of these two operands 526, 528, and returns a boolean value 540 of TRUE or FALSE. If the first operand (OP1) 526 is less than the second operand (OP2) 528, “<” comparator 508 returns a boolean value 540 of TRUE. Otherwise, “<” comparator 508 returns a boolean value 540 of FALSE.

[0057] In one embodiment, TRUE is represented by a 32-bit mask, in which each bit of the mask has a value of “1”. In this embodiment, FALSE is represented by a 32-bit mask, in which each bit of the mask has a value of “0”. However, it should be noted that a 32-bit mask is used to support single precision floating-point format as defined by IEEE Standard 754. Accordingly, a mask that is longer or shorter than thirty-two (32) bits would be used to support a floating-point format that is different than the single precision format. As an example, a 64-bit mask would be used to support a double precision floating-point format as defined by IEEE Standard 754. As another example, an 80-bit mask would be used to support an extended double precision floating-point format as defined by IEEE Standard 754.

[0058] The output 524 of floating-point subtraction operator 506 is fed into “<” comparator 508 as the first operand (OP1) 526 of the comparator 508. A real value 530 of 0.0 is fed into “<” comparator 508 as the second operand (OP2) 528 of the comparator 508. As stated above, the output of floating-point subtraction operator 506 is essentially the fractional portion 524 of input value 514, and is represented in floating-point format. Accordingly, if the fractional portion 524 of input value 514 is less than 0.0, “<” comparator 508 returns a TRUE. Otherwise, “<” comparator 508 returns a FALSE.

[0059] “AND” operator 510 has two operands 532, 534, and returns a bit-wise logical AND of the two operands 532, 534. Rounding apparatus 500 uses “AND” operator 510 to generate an adjustment value 536 of 0 or 1. The adjustment value 536 is represented in a standard integer format, and is subtracted from the integral portion of the input value to appropriately round the input value 514 in accordance with the “floor” or “rounding to minus infinity (−∞)” technique.

[0060] An integer value 538 of 1 is fed into “AND” operator 510 as the first operand (OP1) 532 of the operator 510. The output 540 of “<” comparator 508 is fed into “AND” operator 510 as the second operand (OP2) 534 of the operator 510. As stated above, the output 540 of “<” comparator 508 is a boolean value. This boolean value 540 generally serves as a mask enabling “AND” operator 510 to generate an appropriate adjustment value 536. “AND” operator 510 performs a bit-wise logical AND of OP1 532 and OP2 534, and returns an adjustment value 536. The adjustment value 536 is represented in a standard integer format and has a value of either 0 or 1.

[0061] Integer subtraction operator 512 has two operands 542, 544, and subtracts the second operand (OP2) 544 from the first operand (OP1) 542. The integral portion 516 of the input value is fed into integer subtraction operator 512 as the first operand (OP1) 542 of the operator 512. It should be noted that the integral portion 516 is represented in a standard integer format. It should also be noted that the value that is fed into OP1 542 of integer subtraction operator 512 is the output of FP-to-INT converter 502.

[0062] The output 536 of “AND” operator 510 is fed into integer subtraction operator 512 as the second operand (OP2) 544 of the subtraction operator 512. As stated above, the output 536 of “AND” operator 510 is an adjustment value of either 0 or 1.

[0063] Accordingly, integer subtraction operator 512 subtracts the adjustment value 536 from the integral portion 516 of the input value, and returns the output value, └X┘_(integer) 546. In short, └X┘_(integer) 546 is the result of a “rounding to minus infinity (−∞)” operation performed on the input value 514, and is represented in a standard integer format.
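By way of illustration only, the data flow of rounding apparatus 500 might be sketched in software as follows. The function name `floor_via_mask` is a hypothetical name introduced for this sketch. The sketch uses Python integers, which are not limited to 32 bits, and represents TRUE and FALSE by the 32-bit masks described above.

```python
def floor_via_mask(x: float) -> int:
    """Round x toward minus infinity using truncate, compare, mask, and subtract."""
    integral_int = int(x)               # FP-to-INT: truncates the fraction
    integral_fp = float(integral_int)   # INT-to-FP
    fraction = x - integral_fp          # floating-point subtraction
    # "<" comparator: all-ones mask for TRUE, all-zeros mask for FALSE
    mask = 0xFFFFFFFF if fraction < 0.0 else 0x00000000
    adjustment = 1 & mask               # "AND" operator: yields 0 or 1
    return integral_int - adjustment    # integer subtraction
```

The adjustment is needed only for negative non-integer inputs, for which truncation rounds toward zero rather than toward minus infinity; for example, −1.9 truncates to −1 and is adjusted to −2.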

[0064] Turning now to FIG. 6, this figure is a block diagram of an apparatus 600 for approximating the term 2^(ΔX) using the first five (5) elements of the Taylor series in accordance with one embodiment of the present invention.

[0065] The term 2^(ΔX) can be expressed using the Taylor series in the following equation (5): $\begin{matrix}{2^{\Delta X} = {\sum\limits_{N = 0}^{\infty}\frac{\left( {\Delta X\ln 2} \right)^{N}}{N!}}} & (5)\end{matrix}$

[0066] Horner's method can be used to approximate 2^(ΔX) as expressed using the Taylor series in equation (5). When Horner's method is used to calculate the sum of the first five elements of the series in equation (5), the term 2^(ΔX) can be expressed using the following equation (6): $\begin{matrix}{2^{\Delta X} \cong 1 + \Delta X \times \left( \ln 2 + \Delta X \times \left( \frac{\ln^{2}2}{2!} + \Delta X \times \left( \frac{\ln^{3}2}{3!} + \Delta X \times \frac{\ln^{4}2}{4!} \right) \right) \right),\quad \text{where } \ln^{n}2 \text{ denotes } \left( \ln 2 \right)^{n}.} & (6)\end{matrix}$

[0067] As stated above, FIG. 6 is a block diagram of an apparatus 600 for approximating the term 2^(ΔX) using the first five elements of the series in equation (5). Apparatus 600 includes floating-point multiplication operators and corresponding floating-point addition operators. Each floating-point multiplication operator and its corresponding floating-point addition operator work in cooperation to compute a portion of the sum expressed in equation (6).

[0068] The embodiment illustrated in FIG. 6 shows four floating-point multiplication operators and four corresponding floating-point addition operators needed to approximate 2^(ΔX) using the first five elements of the series in equation (5). However, additional floating-point multiplication and addition operators can be added to approximation apparatus 600 to approximate 2^(ΔX) using six or more elements of the series in equation (5) to produce an approximation of 2^(ΔX) that is more accurate than an approximation that uses five elements.

[0069] Each floating-point multiplication operator has two operands (OP1 and OP2) that are floating-point values, performs a floating-point multiplication on the operands, and returns the product. Each floating-point addition operator has two operands (OP1 and OP2) that are floating-point values, performs a floating-point addition on the operands, and returns the sum.

[0070] The values of ΔX and $\frac{\ln^{4}2}{4!}$,

[0071] 614 and 616 respectively, are represented in floating-point format, and are fed into floating-point multiplication operator 610₁ as operands of the operator 610₁. In one embodiment supporting single precision floating-point data format as defined by IEEE Standard 754, the value 616 of $\frac{\ln^{4}2}{4!}$

[0072] can be represented by the hexadecimal value of 0x3c1d955b. Operator 610₁ performs a floating-point multiplication to generate the product 618 of $\left( \Delta X \times \frac{\ln^{4}2}{4!} \right)$.

[0073] The product 618 generated by floating-point multiplication operator 610₁ is fed into floating-point addition operator 612₁ as one operand of the operator 612₁. A value 620 of $\frac{\ln^{3}2}{3!}$

[0074] is also fed into floating-point addition operator 612₁ as another operand of the operator 612₁. In one embodiment supporting single precision floating-point data format as defined by IEEE Standard 754, the value 620 of $\frac{\ln^{3}2}{3!}$

[0075] can be represented by the hexadecimal value of 0x3d635847. Operator 612₁ performs a floating-point addition on the values 618, 620, and returns the sum 622 of $\left( \frac{\ln^{3}2}{3!} + \Delta X \times \frac{\ln^{4}2}{4!} \right)$.

[0076] The sum 622 generated by floating-point addition operator 612₁ is fed into floating-point multiplication operator 610₂ as one operand of the operator 610₂. The value 614 of ΔX is fed into floating-point multiplication operator 610₂ as another operand of the operator 610₂. Operator 610₂ performs a floating-point multiplication on the values 614, 622 to generate the product 624 of $\left( \Delta X \times \left( \frac{\ln^{3}2}{3!} + \Delta X \times \frac{\ln^{4}2}{4!} \right) \right)$.

[0077] The product 624 generated by floating-point multiplication operator 610₂ is fed into floating-point addition operator 612₂ as one operand of the operator 612₂. A value 626 of $\frac{\ln^{2}2}{2!}$

[0078] is fed into floating-point addition operator 612₂ as another operand of the operator 612₂. In one embodiment supporting single precision floating-point data format as defined by IEEE Standard 754, the value 626 of $\frac{\ln^{2}2}{2!}$

[0079] can be represented by the hexadecimal value of 0x3e75fdf0. Operator 612₂ performs a floating-point addition on the values 624, 626, and returns the sum 628 of $\left( \frac{\ln^{2}2}{2!} + \Delta X \times \left( \frac{\ln^{3}2}{3!} + \Delta X \times \frac{\ln^{4}2}{4!} \right) \right)$.

[0080] The sum 628 generated by floating-point addition operator 612₂ is fed into floating-point multiplication operator 610₃ as one operand of the operator 610₃. The value 614 of ΔX is fed into floating-point multiplication operator 610₃ as another operand of the operator 610₃. Operator 610₃ performs a floating-point multiplication on the values 614, 628 to generate the product 630 of $\left( \Delta X \times \left( \frac{\ln^{2}2}{2!} + \Delta X \times \left( \frac{\ln^{3}2}{3!} + \Delta X \times \frac{\ln^{4}2}{4!} \right) \right) \right)$.

[0081] The product 630 generated by floating-point multiplication operator 610₃ is fed into floating-point addition operator 612₃ as one operand of the operator 612₃. A value 632 of ln 2 is fed into floating-point addition operator 612₃ as another operand of the operator 612₃. In one embodiment supporting single precision floating-point data format as defined by IEEE Standard 754, the value 632 of ln 2 can be represented by the hexadecimal value of 0x3f317218. Operator 612₃ performs a floating-point addition on the values 630, 632, and returns the sum 634 of $\left( \ln 2 + \Delta X \times \left( \frac{\ln^{2}2}{2!} + \Delta X \times \left( \frac{\ln^{3}2}{3!} + \Delta X \times \frac{\ln^{4}2}{4!} \right) \right) \right)$.

[0082] The sum 634 generated by floating-point addition operator 612₃ is fed into floating-point multiplication operator 610₄ as one operand of the operator 610₄. The value 614 of ΔX is fed into floating-point multiplication operator 610₄ as another operand of the operator 610₄. Operator 610₄ performs a floating-point multiplication on the values 614, 634 to generate the product 636 of $\left( \Delta X \times \left( \ln 2 + \Delta X \times \left( \frac{\ln^{2}2}{2!} + \Delta X \times \left( \frac{\ln^{3}2}{3!} + \Delta X \times \frac{\ln^{4}2}{4!} \right) \right) \right) \right)$.

[0083] The product 636 generated by floating-point multiplication operator 610₄ is fed into floating-point addition operator 612₄ as one operand of the operator 612₄. A real value 638 of 1.0 is fed into floating-point addition operator 612₄ as another operand of the operator 612₄. In one embodiment supporting single precision floating-point data format as defined by IEEE Standard 754, the value 638 of 1.0 can be represented by the hexadecimal value of 0x3f800000. Operator 612₄ performs a floating-point addition on the values 636, 638, and returns the sum 640 of $\left( 1 + \Delta X \times \left( \ln 2 + \Delta X \times \left( \frac{\ln^{2}2}{2!} + \Delta X \times \left( \frac{\ln^{3}2}{3!} + \Delta X \times \frac{\ln^{4}2}{4!} \right) \right) \right) \right)$.

[0084] The sum 640 generated by floating-point addition operator 612₄ is an approximation of 2^(ΔX) as defined in equation (6).
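By way of illustration only, the chain of four multiplication operators and four addition operators of apparatus 600 might be sketched in software as follows, using the single precision hexadecimal constants given above. The names `f32`, `LN2`, `C2`, `C3`, `C4`, and `exp2_fraction_horner` are hypothetical names introduced for this sketch. It should be noted that Python performs the intermediate arithmetic in double precision, so the results may differ slightly from those of a single precision implementation.

```python
import struct


def f32(hex_bits: int) -> float:
    """Decode a 32-bit pattern as an IEEE 754 single precision value."""
    return struct.unpack("<f", struct.pack("<I", hex_bits))[0]


LN2 = f32(0x3F317218)  # ln 2
C2 = f32(0x3E75FDF0)   # (ln 2)^2 / 2!
C3 = f32(0x3D635847)   # (ln 2)^3 / 3!
C4 = f32(0x3C1D955B)   # (ln 2)^4 / 4!


def exp2_fraction_horner(dx: float) -> float:
    """Evaluate equation (6) for dx, intended for 0 <= dx < 1."""
    acc = dx * C4 + C3     # first multiply/add pair: sum 622
    acc = dx * acc + C2    # second pair: sum 628
    acc = dx * acc + LN2   # third pair: sum 634
    return dx * acc + 1.0  # fourth pair: sum 640
```

Each line of the function body corresponds to one multiplication operator and its corresponding addition operator, so five elements of the series are evaluated with four multiplications and four additions.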

[0085] It should be noted that the above description of the 2^(ΔX)-approximation apparatus 600 was given in the context of supporting the single precision floating-point data format as defined by IEEE Standard 754. Therefore, hexadecimal values of 0x3f800000, 0x3f317218, 0x3e75fdf0, 0x3d635847, and 0x3c1d955b were used to represent the values of 1.0, ln 2, $\frac{\ln^{2}2}{2!},\frac{\ln^{3}2}{3!},\text{ and }\frac{\ln^{4}2}{4!},$

[0086] respectively. However, a different set of hexadecimal values can be used to support floating-point formats other than the single precision floating-point format defined by IEEE Standard 754.

[0087] It should also be noted that the functional components, as shown in FIGS. 3A, 3B, 5, and 6 and described in the text accompanying the figures, can be implemented in hardware. The functional components can also be implemented using software code segments, where each of the code segments includes one or more assembly instructions. If the functional components are implemented using software code segments, these code segments can be stored on a machine-readable medium, such as floppy disk, hard drive, CD-ROM, DVD, tape, memory, or any storage device that is accessible by a machine.

[0088] FIG. 7 is a flow diagram that generally outlines the process 700 of approximating the term 2^(X) in accordance with one embodiment of the present invention. An input value X is accepted in block 705. The input value is a real number and is represented in floating-point format.

[0089] In block 710, the input value X is rounded using the “floor” or “round to minus infinity (−∞)” technique. The rounded value of X is denoted as └X┘_(integer) and is represented in a standard integer format. └X┘_(integer) is then converted to floating-point format, denoted └X┘_(floating-point) (block 715). A value of ΔX is calculated by subtracting └X┘_(floating-point) from input value X (block 720). After ΔX is calculated, an approximation of 2^(ΔX) is calculated in block 725 using the aforementioned equations (5) and (6).
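As a hedged illustration of blocks 710 through 720, the sketch below splits an input into its floor and its fractional remainder. The function name `split_input` is hypothetical, and Python's `math.floor` stands in for the rounding step.

```python
import math
from typing import Tuple


def split_input(x: float) -> Tuple[int, float]:
    """Return (floor of x as an integer, dx = x - floor of x)."""
    floor_int = math.floor(x)    # block 710: round toward minus infinity
    floor_fp = float(floor_int)  # block 715: integer to floating-point
    dx = x - floor_fp            # block 720: floating-point subtraction
    return floor_int, dx


if __name__ == "__main__":
    print(split_input(2.75))   # positive input
    print(split_input(-2.25))  # negative input: floor is -3, not -2
```

Because the rounding is toward minus infinity rather than toward zero, ΔX is never negative and is less than 1, even when X is negative. This keeps the polynomial approximation of 2^(ΔX) on a single fixed interval.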

[0090] In block 730, a bit-wise left shift operation is performed on └X┘_(integer) to shift └X┘_(integer) to the left by a predetermined number of bit positions, so that the value of └X┘_(integer) is aligned with the exponent of the approximation of 2^(ΔX). In one embodiment, └X┘_(integer) is shifted to the left by twenty-three bits to support the single precision floating-point data format as defined by IEEE Standard 754. In this embodiment, bits 23 to 31 of the shifted └X┘_(integer) value contain └X┘_(integer), and bits 0 to 22 of the shifted └X┘_(integer) value 336 contain a value of zero (0). For additional details, see FIGS. 4A and 4B and the description of these figures.
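The shift of block 730 can be illustrated with a short sketch. It assumes a 32-bit two's complement integer, which Python emulates here by masking to 32 bits; `shift_into_exponent` is a hypothetical name.

```python
def shift_into_exponent(floor_int: int) -> int:
    """Shift floor(X) left by 23 bits, keeping only the low 32 bits."""
    return (floor_int << 23) & 0xFFFFFFFF


if __name__ == "__main__":
    for value in (3, 1, -2):
        print(f"{value:3d} -> 0x{shift_into_exponent(value):08x}")
```

After the shift, bits 0 to 22 are zero, so a later integer addition leaves the mantissa bits of the other operand untouched.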

[0091] In block 735, the shifted └X┘_(integer) value is added to the approximation of 2^(ΔX) using an integer addition operation. Accordingly, └X┘_(integer) is added to the exponent of the approximation of 2^(ΔX). By adding └X┘_(integer) to the exponent of the approximation of 2^(ΔX), an approximation of 2^(X) is calculated in accordance with the principles set forth in the aforementioned equation (3).
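The blocks of process 700 can be combined into one short software sketch. This is an illustration under stated assumptions, not the claimed apparatus: the polynomial is evaluated in Python double precision and only then rounded to single precision, 32-bit integer arithmetic is emulated by masking, and the helper names (`float32_to_bits`, `bits_to_float32`, `approx_pow2`) are hypothetical. The sketch does not handle results outside the normal single precision exponent range.

```python
import math
import struct

LN2 = math.log(2)
COEFFS = [LN2 ** n / math.factorial(n) for n in range(5)]  # (ln 2)^n / n!


def float32_to_bits(value: float) -> int:
    """Round to single precision and return the 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a single precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def approx_pow2(x: float) -> float:
    floor_int = math.floor(x)                  # block 710
    floor_fp = float(floor_int)                # block 715
    dx = x - floor_fp                          # block 720
    acc = COEFFS[4]                            # block 725: Horner evaluation
    for c in reversed(COEFFS[:4]):
        acc = c + dx * acc
    shifted = (floor_int << 23) & 0xFFFFFFFF   # block 730
    # Block 735: integer addition; floor_int lands in the exponent field.
    total = (float32_to_bits(acc) + shifted) & 0xFFFFFFFF
    return bits_to_float32(total)


if __name__ == "__main__":
    for x in (3.0, -2.0, 2.5, -2.25):
        print(x, approx_pow2(x), 2.0 ** x)
```

The integer addition works because the approximation of 2^(ΔX) lies between 1 and 2, so its exponent field holds only the bias. Adding the shifted floor value therefore scales the result by 2 raised to that floor value without a floating-point multiplication.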

[0092] While certain exemplary embodiments have been described and shown in accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention, and that this invention not be limited to the specific constructions and arrangements shown and described, since various other modifications may occur to those ordinarily skilled in the art.

What is claimed is:
 1. A computing system, comprising: a first approximation apparatus to approximate the term 2^(X), wherein X is a real number; a memory to store a computer program that utilizes the first approximation apparatus; and a central processing unit (CPU) to execute the computer program, wherein the CPU is cooperatively connected to the first approximation apparatus and the memory.
 2. The system of claim 1, wherein the first approximation apparatus includes: a rounding apparatus to accept an input value (X) that is a real number represented in floating-point format, and to compute a rounded value (└X┘_(integer)) by rounding the input value (X) toward minus infinity, wherein the rounded value (└X┘_(integer)) is represented in an integer format.
 3. The system of claim 1, wherein the first approximation apparatus includes: an integer-to-floating-point converter to accept as input a first rounded value (└X┘_(integer)) represented in an integer format, and to convert the first rounded value (└X┘_(integer)) to a second rounded value (└X┘_(floating-point)) represented in floating-point format.
 4. The system of claim 1, wherein the first approximation apparatus includes: a floating-point subtraction operator to compute the difference between an input value (X) and └X┘_(floating-point) which is the input value (X) rounded toward minus infinity and is represented in floating-point format.
 5. The system of claim 1, wherein the first approximation apparatus includes a shift-left logical operator to generate a shifted └X┘_(integer) value by shifting a rounded value (└X┘_(integer)) to the left by a predetermined number of bit positions.
 6. The system of claim 1, wherein the first approximation apparatus includes: a second approximation apparatus to accept ΔX as input, to approximate 2^(ΔX), and to return an approximation of 2^(ΔX), wherein ΔX=X−└X┘_(floating-point) and └X┘_(floating-point) is the input value (X) rounded toward minus infinity and is represented in floating-point format.
 7. The system of claim 6, wherein the second approximation apparatus computes the approximation of 2^(ΔX) by applying Horner's method in calculating a sum of a plurality of elements of a series in the equation $2^{\Delta X} = \sum\limits_{N = 0}^{\infty} \frac{\left( \Delta X \ln 2 \right)^{N}}{N!}.$


8. The system of claim 1, wherein the first approximation apparatus includes: an integer addition operator to accept a shifted └X┘_(integer) value and an approximation of 2^(ΔX) as input, and to perform an integer addition operation on the shifted └X┘_(integer) value and the approximation of 2^(ΔX) to generate an approximation of 2^(X), wherein ΔX=X−└X┘_(floating-point) and └X┘_(floating-point) is the input value (X) rounded toward minus infinity and is represented in floating-point format.
 9. The system of claim 1, further comprising: a third approximation apparatus to approximate a term C^(Z), wherein C is a constant and a positive number and Z is a real number, the third approximation apparatus using a floating-point multiplication operator to compute a product of log₂ C×Z, and feeding the product of log₂ C×Z into the first approximation apparatus to generate an approximation of C^(Z).
 10. A method comprising: generating a first rounded value and a second rounded value; subtracting the second rounded value from an input value (X) to generate ΔX; generating an approximation of 2^(ΔX); performing a bit-wise left shift to the first rounded value to generate a shifted value; and approximating 2^(X) by performing an integer addition operation to add the shifted value to the approximation of 2^(ΔX).
 11. The method of claim 10, wherein generating the first rounded value comprises: rounding an input value (X) downward to generate the first rounded value represented in an integer format.
 12. The method of claim 10, wherein generating the second rounded value comprises: converting the first rounded value represented in an integer format to the second rounded value represented in floating-point format.
 13. The method of claim 10, wherein generating an approximation of 2^(ΔX) comprises: applying Horner's method in calculating a sum of a plurality of elements of a series in the equation $2^{\Delta X} = \sum\limits_{N = 0}^{\infty} \frac{\left( \Delta X \ln 2 \right)^{N}}{N!}.$


14. The method of claim 10, wherein performing a bit-wise left shift operation to the first rounded value comprises: shifting the first rounded value to the left by a predetermined number of bit positions so that the first rounded value occupies bit positions reserved for an exponent of a floating-point value.
 15. The method of claim 10, wherein approximating 2^(X) comprises: performing an integer addition operation to add the shifted value to the approximation of 2^(ΔX), such that the first rounded value is added to an exponent value of the approximation of 2^(ΔX).
 16. A machine-readable medium comprising instructions which, when executed by a machine, cause the machine to perform operations comprising: a first code segment to perform computations to approximate the term 2^(X), wherein X is a real number.
 17. The machine-readable medium of claim 16, wherein the first code segment includes: a second code segment to accept an input value (X) that is a real number represented in floating-point format, to compute a rounded value (└X┘_(integer)) by rounding the input value (X) toward minus infinity, and to return the rounded value (└X┘_(integer)) which is represented in an integer format.
 18. The machine-readable medium of claim 17, wherein the second code segment computes the approximation of 2^(ΔX) by applying Horner's method in calculating a sum of a plurality of elements of a series in the following equation, $2^{\Delta X} = \sum\limits_{N = 0}^{\infty} \frac{\left( \Delta X \ln 2 \right)^{N}}{N!}.$


19. The machine-readable medium of claim 16, wherein the first code segment includes: a third code segment to accept ΔX as input and to generate an approximation of 2^(ΔX), wherein ΔX=X−└X┘_(floating-point) and └X┘_(floating-point) is the input value (X) rounded and is represented in floating-point format.
 20. The machine-readable medium of claim 16, wherein the first code segment includes: a fourth code segment to accept a shifted └X┘_(integer) value and an approximation of 2^(ΔX) as input, and to generate an approximation of 2^(X) by performing an integer addition operation on the shifted └X┘_(integer) value and the approximation of 2^(ΔX), wherein ΔX=X−└X┘_(floating-point) and └X┘_(floating-point) is the input value (X) rounded and is represented in floating-point format.
 21. The machine-readable medium of claim 16, further including: a fifth code segment to approximate a term C^(Z), wherein C is a constant and a positive number and Z is a real number, the fifth code segment computing a product of log₂ C×Z and feeding the product of log₂ C×Z into the first code segment to generate an approximation of C^(Z).